Abstract

Diffusion magnetic resonance imaging (dMRI) data allow the reconstruction of
the 3D pathways of axons within the white matter of the brain as a
tractography. The analysis of tractographies has drawn attention from the
machine learning and pattern recognition communities, providing novel
challenges such as finding an appropriate representation space for the data.
Many of the current learning algorithms require the input to be from a
vectorial space. This requirement contrasts with the intrinsic nature of the
tractography, because its basic elements, called streamlines or tracks, have
different lengths and different numbers of points and, for this reason, they
cannot be directly represented in a common vectorial space.
In this work we propose the adoption of the dissimilarity representation,
a Euclidean embedding technique defined by selecting a set of streamlines
called prototypes and then mapping any new streamline to the vector of its
distances from the prototypes. We investigate the degree of approximation of
this projection under different prototype selection policies and prototype
set sizes in order to characterise its use on tractography data.
Additionally, we propose the use of a scalable approximation of the most
effective prototype selection policy that provides fast and accurate
dissimilarity approximations of complete tractographies.


1 Introduction 

Deterministic tractography algorithms [S] can reconstruct white matter fiber
tracts as a set of streamlines, also known as tracks, from diffusion Magnetic
Resonance Imaging (dMRI) [5] data. A streamline is a mathematical approximation
of thousands of neuronal axons expressing anatomical connectivity between
different areas of the brain, see Figure 1. Recently there has been increasing
interest in analysing dMRI/tractography data by means of machine learning
and pattern recognition methods, e.g. [Miiin]. These methods often require the
data to lie in a vectorial space, which is not the case for streamlines:
streamlines are polylines in 3D space with different lengths and numbers of
points. The goal of this work is to investigate the features and limits of a
specific Euclidean embedding, i.e. the dissimilarity representation, that was
recently applied to the analysis of tractography data [9].

Figure 1: A set of 100 streamlines, i.e. an example of prototypes, from a full
tractography.

The dissimilarity representation is a Euclidean embedding technique defined
by selecting a set of objects (e.g. a set of streamlines) called prototypes,
and then by mapping any new object (e.g. any new streamline) to the vector of
distances from the prototypes. This representation [mill El] is usually
presented in the context of classification and clustering problems. It is a
lossy transformation in the sense that some information is lost when
projecting the data into the dissimilarity space. To the best of our knowledge
this loss, i.e. the degree of approximation, has received little attention in
the literature. In [m] the approximation was studied to decide among competing
prototype selection policies, but only for classification tasks. In this work
we are interested in assessing and controlling this loss without restricting
ourselves to the classification scenario.

This work is motivated by practical applications that execute common
algorithms, such as spatial queries, clustering or classification, on large
collections of objects that do not have a natural vectorial space
representation. The lack of a vectorial representation prevents the use of
some of those algorithms and of their computationally efficient
implementations. The dissimilarity space representation could provide such a
vectorial representation, and for this reason it is crucial to assess the
degree of approximation it introduces. Besides this characterisation we
propose the use of a stochastic approximation of a near-optimal algorithm for
prototype selection that scales well on large datasets. This scalability issue
is of primary importance for tractographies, given that a full brain
tractography is a large collection of streamlines, usually ≈ 3 x 10^, a size
for which algorithms may become impractical. We provide practical examples
from both simulated data and human brain tractographies.
2 Methods


In the following we present a concise formal description of the dissimilarity
projection together with a notion of approximation to quantify how accurate
this representation is. Additionally, we introduce three strategies for
prototype selection that will be compared in Section 3.

2.1 The Dissimilarity Projection 

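Let $\mathcal{X}$ be the space of the objects of interest, e.g. streamlines,
and let $X \in \mathcal{X}$. Let $P_X$ be a probability distribution over
$\mathcal{X}$. Let $d: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$ be a
distance function between objects in $\mathcal{X}$. Note that $d$ is not
assumed to be necessarily metric. Let $\Pi = \{\tilde{X}_1, \ldots, \tilde{X}_p\}$,
where $\forall i\ \tilde{X}_i \in \mathcal{X}$ and $p$ is finite. We call each
$\tilde{X}_i$ a prototype or landmark. The dissimilarity representation, or
projection, is defined as $\phi_\Pi^d: \mathcal{X} \to \mathbb{R}^p$ s.t.

$$\phi_\Pi^d(X) = [d(X, \tilde{X}_1), \ldots, d(X, \tilde{X}_p)] \qquad (1)$$

and maps an object $X$ from its original space $\mathcal{X}$ to a vector of
$\mathbb{R}^p$.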

Note that this representation is lossy in the sense that, in general, it is
not possible to exactly reconstruct $X$ from $\phi_\Pi^d(X)$ because some
information is lost during the projection.

We define the distance between projected objects as the Euclidean distance
between them: $\Delta_\Pi^d(X, X') = \|\phi_\Pi^d(X) - \phi_\Pi^d(X')\|_2$, i.e.
$\Delta_\Pi^d: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$. Intuitively,
$\Delta_\Pi^d$ and $d$ should be strongly related. In the following sections we
present more details and explanations about this relation.
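To make Eq. (1) and the definition of $\Delta_\Pi^d$ concrete, here is a minimal
Python/NumPy sketch. The function names are hypothetical and the distance is
passed in as an arbitrary callable, mirroring the fact that $d$ need not be a
metric. The toy objects are variable-length lists compared by the gap between
their means, an illustrative non-vectorial example that is not taken from the
experiments.

```python
import numpy as np

def dissimilarity_projection(objects, prototypes, distance):
    """Map each object to the vector of its distances from the prototypes (Eq. 1)."""
    return np.array([[distance(x, proto) for proto in prototypes] for x in objects])

def projected_distance(phi_a, phi_b):
    """Euclidean distance between two projected objects, i.e. Delta_Pi^d."""
    return float(np.linalg.norm(np.asarray(phi_a) - np.asarray(phi_b)))

def mean_gap(a, b):
    """Hypothetical dissimilarity between variable-length sequences (not a metric)."""
    return abs(np.mean(a) - np.mean(b))

objects = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]   # objects of different "lengths"
prototypes = [[0.0], [5.0, 5.0]]                  # p = 2 prototypes
phi = dissimilarity_projection(objects, prototypes, mean_gap)
# phi holds [[2, 3], [10, 5], [5, 0]]: every object now lives in R^2
delta_01 = projected_distance(phi[0], phi[1])
```

Whatever the original object type, the output is a fixed-size $|S| \times p$
matrix, which is what allows vector-based learning algorithms to be applied.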

2.2 A Measure of Approximation 

We investigate the relationship between the distribution of distances among
objects in $\mathcal{X}$ through $d$ and the corresponding distances in the
dissimilarity representation space through $\Delta_\Pi^d$. We claim that a good
dissimilarity representation must be able to accurately preserve the partial
order of the distances, i.e. if $d(X, X') < d(X, X'')$ then
$\Delta_\Pi^d(X, X') < \Delta_\Pi^d(X, X'')$ for each $X, X', X'' \in \mathcal{X}$
almost always. As a measure of the degree of approximation of the dissimilarity
representation we define the Pearson correlation coefficient $\rho$ between the
two distances over all possible pairs of objects in $\mathcal{X}$:

$$\rho = \frac{\mathrm{Cov}\big(d(X, X'),\, \Delta_\Pi^d(X, X')\big)}{\sigma_{d(X, X')}\, \sigma_{\Delta_\Pi^d(X, X')}} \qquad (2)$$

where $X, X' \sim P_X$. In practical cases $P_X$ is unknown and only a finite
sample $S$ is available. We can approximate $\rho$ with the sample correlation
$r$, where $X, X' \in S$. An accurate approximation of the relative distances
between objects in $\mathcal{X}$ results in values of $\rho$ far from zero and
close to 1.¹

¹ Note that negative correlation is not considered an accurate approximation.
Moreover, it never occurred during the experiments.
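The sample estimate $r$ of Eq. (2) can be sketched as follows, assuming the
object sample $S$ and the prototypes are given as Python sequences and the
distance as a callable. The function name `approximation_correlation` is
hypothetical; pairs are enumerated as unordered pairs of distinct objects, and
the first three sample points are used as prototypes purely for illustration.

```python
import itertools
import numpy as np

def approximation_correlation(objects, prototypes, distance):
    """Sample Pearson correlation r between d and Delta_Pi^d over all pairs of objects."""
    # Project every object: phi[i, k] = d(X_i, prototype_k)
    phi = np.array([[distance(x, q) for q in prototypes] for x in objects])
    d_orig, d_proj = [], []
    for i, j in itertools.combinations(range(len(objects)), 2):
        d_orig.append(distance(objects[i], objects[j]))  # distance in the original space
        d_proj.append(np.linalg.norm(phi[i] - phi[j]))    # distance in the projected space
    return np.corrcoef(d_orig, d_proj)[0, 1]

def euclid(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

rng = np.random.default_rng(0)
sample = rng.standard_normal((50, 2))  # a finite sample S
r = approximation_correlation(sample, sample[:3], euclid)
```

The number of pairs grows quadratically with $|S|$, so on large collections one
would estimate $r$ on a random subsample of pairs rather than on all of them.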
In the literature on Euclidean embeddings of metric spaces, the term
distortion is used to describe the relation between the distances in the
original space and the corresponding ones in the projected space. The
embedding is said to have distortion $c$ if for every $x, x' \in \mathcal{X}$:

$$d(x, x') \ge \Delta_\Pi^d(x, x') \ge \frac{1}{c}\, d(x, x'). \qquad (3)$$

An interesting embedding of metric spaces is described in [7]. It is based on
ideas similar to the dissimilarity representation and has the advantage of
providing a theoretical bound on the distortion. Unfortunately this embedding
is computationally too expensive to be used in practice.

We argue that correlation and distortion target slightly different aspects of
the embedding quality: the first focuses on the average agreement between
distances in the original and projected spaces, the second on the worst case.
For this reason we claim that, in the context of machine learning and pattern
recognition applications, correlation is the more appropriate measure.
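The difference between the two measures can be illustrated numerically. The
sketch below computes an empirical distortion on a finite set of pairs,
assuming strictly positive original distances and allowing $\Delta_\Pi^d$ to be
rescaled so that the upper bound of Eq. (3) holds, next to the correlation of
Eq. (2). The distances are hypothetical: the projection is exact on 99 pairs
and shrinks one pair tenfold.

```python
import numpy as np

def empirical_distortion(d_orig, d_proj):
    """Smallest c for which a rescaled Delta satisfies d >= Delta >= d / c on all pairs (Eq. 3).

    Assumes every original distance is strictly positive (pairs of distinct objects).
    """
    ratio = np.asarray(d_proj, dtype=float) / np.asarray(d_orig, dtype=float)
    # Dividing Delta by ratio.max() makes it never exceed d; the remaining
    # worst-case contraction is ratio.min() / ratio.max(), hence c = max / min.
    return ratio.max() / ratio.min()

def distance_correlation(d_orig, d_proj):
    """Pearson correlation between original and projected distances (Eq. 2)."""
    return np.corrcoef(d_orig, d_proj)[0, 1]

# Hypothetical distances over 100 pairs: exact except for one pair shrunk tenfold.
d_orig = np.arange(1.0, 101.0)
d_proj = d_orig.copy()
d_proj[0] = 0.1
c = empirical_distortion(d_orig, d_proj)   # about 10, driven by the single bad pair
r = distance_correlation(d_orig, d_proj)   # still very close to 1
```

A single badly represented pair is enough to make the distortion large, while
the correlation, which averages over all pairs, barely changes.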

2.3 Strategies for Prototype Selection 

The definition of the set of prototypes with the goal of minimising the loss of
the dissimilarity projection is an open issue in the dissimilarity space
representation literature. In the context of classification problems the
policy of random selection of the prototypes was proved to be useful under
certain assumptions [T]. In the following we address the issue of choosing the
prototypes in order to achieve the desired degree of approximation, without
restricting ourselves to the classification case. We define and discuss the
following policies for prototype selection: random selection, farthest first
traversal (FFT) and subset farthest first (SFF). All these policies are
parametric with respect to $p$, i.e. the number of prototypes.

2.3.1 Random Selection 

In practical cases we have a sample of objects
$S = \{X_1, \ldots, X_N\} \subset \mathcal{X}$. This selection policy draws the
prototypes uniformly at random from $S$, i.e. $\Pi \subset S$ and $|\Pi| = p$.
Note that sampling is without replacement because identical prototypes provide
redundant, i.e. useless, information. This policy was first proposed in [^] for
seeding clustering algorithms. It has the lowest computational complexity of
the three policies, $O(p)$, which does not depend on $|S|$.
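A minimal sketch of the random policy, assuming the objects of $S$ are
addressed by their indices $0, \ldots, N-1$; it relies on NumPy's
`Generator.choice` with `replace=False` to sample without replacement. The
function name is hypothetical.

```python
import numpy as np

def random_selection(n_objects, p, rng=None):
    """Indices of p prototypes drawn uniformly at random, without replacement, from S."""
    rng = np.random.default_rng(rng)
    return rng.choice(n_objects, size=p, replace=False)

prototype_idx = random_selection(n_objects=1000, p=20, rng=42)
```

No distance is evaluated at all, which is why this policy is the cheapest one.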

2.3.2 Farthest First Traversal (FFT) 

This policy selects an initial prototype at random from $S$ and then each new
one is defined as the element of $S$ farthest from all previously chosen
prototypes, i.e. the element whose distance to its closest prototype is
maximal. The FFT policy is related to the $k$-center problem [^]: given a set
$S$ and an integer $k$, what is the smallest $\epsilon$ for which one can find
an $\epsilon$-cover² of $S$ of size $k$³? The $k$-center problem is known to be
NP-hard [H], i.e. no efficient algorithm can be devised that always returns
the optimal answer (unless P = NP). Nevertheless FFT is known to be close to
the optimal solution in the following sense: if $T$ is the solution returned
by FFT and $T^*$ is the optimal solution, then
$\max_{x \in S} d(x, T) \le 2 \max_{x \in S} d(x, T^*)$. Moreover, in metric
spaces, achieving a better approximation ratio is NP-hard [5]. FFT has
$O(p|S|)$ complexity; unfortunately, when $|S|$ becomes very large this
prototype selection policy becomes impractical.
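The FFT policy described above can be sketched in Python as follows. The
function name is hypothetical, the distance is any callable, and $p$ is assumed
not to exceed the number of distinct objects in $S$. Keeping, for every object,
its distance to the closest prototype chosen so far is what yields the
$O(p|S|)$ number of distance evaluations.

```python
import numpy as np

def farthest_first_traversal(objects, p, distance, rng=None):
    """Return the indices of p prototypes chosen by farthest-first traversal."""
    rng = np.random.default_rng(rng)
    n = len(objects)
    first = int(rng.integers(n))           # initial prototype chosen at random
    chosen = [first]
    # min_dist[i] = distance from objects[i] to its closest chosen prototype
    min_dist = np.array([distance(objects[i], objects[first]) for i in range(n)])
    while len(chosen) < p:
        nxt = int(np.argmax(min_dist))     # farthest object from the current prototype set
        chosen.append(nxt)
        new_dist = np.array([distance(objects[i], objects[nxt]) for i in range(n)])
        min_dist = np.minimum(min_dist, new_dist)
    return chosen

def euclid(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

points = np.array([[0.0], [1.0], [2.0], [10.0]])
selected = farthest_first_traversal(points, p=2, distance=euclid, rng=0)
```

Each iteration scans the whole of $S$, which is exactly the cost that becomes
prohibitive for full tractographies.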

2.3.3 Subset Farthest First (SFF) 

In the context of radial basis function network initialisation, a scalable
approximation of the FFT algorithm, called subset farthest first (SFF), was
proposed in [12]. This approximation is also claimed to reduce the chance of
selecting outliers, which can lead to a poor representation of large datasets.
The SFF policy samples $m = \lceil c\, p \log p \rceil$ points from $S$
uniformly at random and then applies FFT on this sample in order to select the
$p$ prototypes. In [12] it was proved that, under the hypothesis of $p$
(equally sized) clusters in $S$, the probability of not having a representative
of some cluster in the sample is at most $p^{1-c}$. The computational
complexity of SFF is $O(p^2 \log p)$. Note that for large datasets and small
$p$ this prototype selection policy has a much lower computational cost than
FFT.
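A sketch of SFF under stated assumptions: $\log$ is taken as the natural
logarithm (the base is not specified above), $m$ is clipped to the range
$[p, |S|]$ so that FFT can always return $p$ prototypes, and the helper
`_fft` is a hypothetical re-implementation of farthest-first traversal on the
subset. Only the $m$ sampled objects enter the FFT step, so the number of
distance evaluations is $O(p\,m) = O(c\,p^2 \log p)$.

```python
import math
import numpy as np

def _fft(objects, p, distance, rng):
    """Farthest-first traversal on a (small) list of objects; returns local indices."""
    n = len(objects)
    chosen = [int(rng.integers(n))]
    min_dist = np.array([distance(o, objects[chosen[0]]) for o in objects])
    while len(chosen) < p:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, [distance(o, objects[nxt]) for o in objects])
    return chosen

def subset_farthest_first(objects, p, distance, c=3, rng=None):
    """SFF: run FFT on a random subset of m = ceil(c * p * log p) objects of S."""
    rng = np.random.default_rng(rng)
    n = len(objects)
    m = min(n, max(p, math.ceil(c * p * math.log(p))))   # natural log assumed
    subset = rng.choice(n, size=m, replace=False)
    local = _fft([objects[i] for i in subset], p, distance, rng)
    return [int(subset[j]) for j in local]                # map back to indices in S

def euclid(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

data = np.random.default_rng(0).standard_normal((5000, 2))
prototypes = subset_farthest_first(data, p=10, distance=euclid, rng=1)
```

With $p = 10$ and $c = 3$ the FFT step runs on only 70 objects, regardless of
whether $S$ contains thousands or hundreds of thousands of elements.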


3 Experiments 

In the following we describe the assessment of the degree of approximation of 
the dissimilarity representation across different prototype selection policies and 
different numbers of prototypes. The aim is to investigate the trade-off between 
accuracy and computational cost. The experiments are carried out on 2D
simulated data and on real tractographies reconstructed from dMRI recordings of
the human brain. 


3.1 Simulated Data 


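Let $\mathcal{X} = \mathbb{R}^2$, $P_X = \mathcal{N}(\mu, \Sigma)$,
$\mu = [0, 0]$, $\Sigma = I$, $d(X, X') = \|X - X'\|_2$, $p = 3$ and
$\tilde{X}_1, \tilde{X}_2, \tilde{X}_3 \sim P_X$. Then
$\phi_\Pi^d(X) = [\|X - \tilde{X}_1\|_2, \|X - \tilde{X}_2\|_2, \|X - \tilde{X}_3\|_2]$.

Figure 2 shows a sample of 50 points drawn from $P_X$ together with the 3
prototypes $\tilde{X}_1, \tilde{X}_2, \tilde{X}_3$. Figure 3 shows the sample
projected into the dissimilarity space together with the prototypes.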
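The simulated setting can be sketched in a few lines of NumPy. This covers only
prototypes drawn from $P_X$, as in Figure 2, and not the FFT or SFF policies;
the seed and the helper names are arbitrary choices for illustration. It draws
50 points and 3 prototypes from $\mathcal{N}(0, I)$, projects the points into
$\mathbb{R}^3$ and estimates the correlation over 50 repetitions.

```python
import numpy as np

def pairwise_euclidean(A):
    """Condensed vector of Euclidean distances between all pairs of rows of A."""
    iu = np.triu_indices(len(A), k=1)
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)[iu]

def simulate_once(rng, n=50, p=3):
    X = rng.standard_normal((n, 2))            # sample from N([0, 0], I)
    prototypes = rng.standard_normal((p, 2))   # prototypes drawn from the same P_X
    # phi[i, k] = ||X_i - prototype_k||_2, the projection of Eq. (1) into R^p
    phi = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    return phi, pairwise_euclidean(X), pairwise_euclidean(phi)

rng = np.random.default_rng(0)
rs = []
for _ in range(50):                            # 50 repetitions of the simulated dataset
    _, d_orig, d_proj = simulate_once(rng)
    rs.append(np.corrcoef(d_orig, d_proj)[0, 1])
print(f"r = {np.mean(rs):.3f} +/- {np.std(rs):.3f}")
```

Because $d$ is a metric here, the triangle inequality gives
$|d(X, \tilde{X}_k) - d(X', \tilde{X}_k)| \le d(X, X')$ for every prototype, so
$\Delta_\Pi^d \le \sqrt{p}\, d$ always holds; the correlation measures how well
the ordering of distances is preserved beyond this bound.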

The selection of the prototypes according to the different policies is
explained in Section 2.3. For SFF we chose $c = 3$ in order to have a high
probability (> 0.95) of accurately representing $S$ through the subset. Each
dataset was projected into the dissimilarity space. The correlation $\rho$
between distances in the original space and the corresponding distances in the
projected space was estimated over 50 repetitions of the simulated dataset.
The average correlation and one standard deviation for each prototype
selection strategy are shown in Figure 4.

In this simulated dataset both SFF and FFT performed significantly better than
the random selection, on average. FFT showed a small advantage over SFF when
$p < 10$.

Figure 2: A 2-dimensional example of 50 points (black circles) drawn from
$\mathcal{N}(0, I)$ and 3 prototypes (red stars) drawn from the same pdf.

Figure 3: The dissimilarity projection of the dataset and prototypes of
Figure 2.

Figure 4: Average correlation between $d$ and $\Delta_\Pi^d$ across different
prototype selection policies and different numbers of prototypes.

² Given a metric space $(\mathcal{X}, d)$, for any $\epsilon > 0$, an
$\epsilon$-cover of a set $S \subseteq \mathcal{X}$ is defined to be any set
$T \subseteq \mathcal{X}$ such that $d(x, T) \le \epsilon,\ \forall x \in S$.
Here $d(x, T)$ is the distance from point $x$ to the closest point in set $T$.

³ Note that in our problem $k$ is called $p$.

3.2 Tractography data 

We estimated the dissimilarity representation over tractography data from
dMRI recordings acquired at the MRI facility of the MRC Cognition and Brain
Sciences Unit, Cambridge, UK. The dataset consisted of 12 healthy subjects;
101 (+1, i.e. $b = 0$) gradients; b-values from 0 to 4000 s/mm²; voxel size:
2.5 × 2.5 × 2.5 mm³. In order to obtain the tractography we computed the
single tensor reconstruction (DTI) and created the streamlines using EuDX, a
deterministic tracking algorithm from the DiPy library⁴. We obtained two
tractographies using 10^ and 3 x 10® random seeds respectively. The first
tractography consisted of approximately 10® streamlines and the second one of
3 x 10® streamlines. An example of a set of prototypes from the larger
tractography is shown in Figure 1.

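As the distance between streamlines we chose one of the most common, i.e. the
symmetric minimum average distance from [m], defined as
$d(X_a, X_b) = \frac{1}{2}\big(\delta(X_a, X_b) + \delta(X_b, X_a)\big)$, where

$$\delta(X_a, X_b) = \frac{1}{|X_a|} \sum_{x_i \in X_a} \min_{y \in X_b} \|x_i - y\|_2. \qquad (4)$$

⁴ http://www.dipy.org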
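Eq. (4) translates directly into NumPy if each streamline is stored as an array
of shape (number of points, 3). The sketch below is an independent
illustration, not the DiPy implementation used in the experiments, and the two
streamlines are hypothetical. It builds the full $|X_a| \times |X_b|$ matrix of
point-to-point distances, which is fine for streamlines of moderate length.

```python
import numpy as np

def mean_min_distance(a, b):
    """delta(a, b): average, over the points of a, of the distance to the closest point of b."""
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # |a| x |b| distances
    return pairwise.min(axis=1).mean()

def symmetric_mam_distance(a, b):
    """d(a, b) = (delta(a, b) + delta(b, a)) / 2, as in Eq. (4)."""
    return 0.5 * (mean_min_distance(a, b) + mean_min_distance(b, a))

# Two hypothetical streamlines with different numbers of 3D points.
s1 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
s2 = np.array([[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]])
d12 = symmetric_mam_distance(s1, s2)
```

Since $\delta$ alone is not symmetric, averaging both directions is what makes
$d(X_a, X_b) = d(X_b, X_a)$; the two streamlines need not have the same number
of points.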
Figure 5: The correlation between $d$ and $\Delta_\Pi^d$ over a 10^ streamlines
tractography for different prototype selection policies.


As shown in Figure 5, for the case of a tractography of 10^ streamlines both
FFT and SFF ($c = 3$) had significantly higher correlation than random
sampling for all numbers of prototypes considered. We confirmed that the SFF
selection policy is an accurate approximation of the FFT policy for
tractographies. Moreover, we noted that after 15-20 prototypes the correlation
reaches approximately 0.95 on average (50 repetitions) and then slightly
decreases, indicating that a small number of prototypes is sufficient to reach
a very accurate dissimilarity representation.

Figure 6 shows the correlation for SFF and the random policy when the
tractography has 3 x 10^ streamlines, i.e. the standard size of a tractography
from current dMRI recording techniques. In this case FFT is impractical,
because it requires approximately 15 minutes on a standard desktop computer
for a single repetition when $p = 50$. The cost of computing SFF is instead
the same as in the case of 10^ streamlines, since its computational cost
depends only on the number of prototypes: one repetition with $p = 50$ took
≈ 2 seconds on a standard desktop computer. We observed that for 3 x 10^
streamlines SFF significantly outperformed the random policy and reached the
highest correlation, 0.96 on average (50 repetitions), for 15-25 prototypes.

Note that the figures presented in this section refer to data from subject 1
of the dMRI dataset. We conducted the same experiments on the other subjects,
obtaining equivalent results. The code to reproduce all the experiments is
available at https://github.com/emanuele/prni2012_dissimilarity under an open
source license.
Figure 6: The correlation between $d$ and $\Delta_\Pi^d$ for a full tractography
of 3 x 10^ streamlines for the random and SFF prototype selection policies.

4 Discussion 

In this work we investigated the degree of approximation of the dissimilarity
representation with the goal of preserving the relative distances between
streamlines within tractographies. The empirical assessment was conducted on
two different datasets and with various prototype selection methods. All of
the results, on both simulated and real tractography data, reached a
correlation > 0.95 with respect to the distances in the original space. This
indicates that the dissimilarity representation works well for preserving the
relative distances. Moreover, on tractography data the maximum correlation was
reached with just approximately 20-25 prototypes, showing that the
dissimilarity representation can produce compact feature spaces for this kind
of data.

When comparing the different prototype selection policies we found that FFT
had a small advantage over SFF, but only when the number of prototypes was
very low ($p < 10$). Both FFT and SFF always outperformed the random policy.
Moreover, since the computational cost of SFF does not increase with the size
of the dataset but only with the number of prototypes, we observed that the
SFF policy can easily be computed on a standard computer even for a
tractography of 3 x 10® streamlines. In contrast, FFT is several orders of
magnitude slower than SFF and thus less practical.

We advocate the use of the dissimilarity representation for the Euclidean
embedding of tractography data in machine learning and pattern recognition
applications. Moreover, we strongly suggest the use of the SFF policy to
obtain an efficient and effective selection of the prototypes.